[GH-2608] Fix RasterUDT JSON schema serialization for Delta/Parquet write #2636
Merged
RasterUDT is a Scala case object whose getClass.getName returns 'RasterUDT$', with a trailing $ sign. Spark's UserDefinedType.jsonValue uses this class name in the JSON schema. When Delta/Parquet tries to reconstruct the UDT via Class.forName(...).getConstructor().newInstance(), it fails because the singleton object's constructor is private. This fix adds a jsonValue override (identical to the existing fix in GeometryUDT and GeographyUDT) that strips the trailing $ from the class name, allowing correct round-trip serialization.
Closes #2608
Closes #2347
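The failure mode described above can be reproduced without Spark. The following is a minimal sketch, assuming a hypothetical DemoUDT class with a case object of the same name extending it (stand-ins for illustration, not Sedona's actual classes). It shows that the singleton's runtime class name carries the trailing $, that reflective construction from that name fails, and that stripping the suffix yields a name that can be instantiated.

```scala
import scala.util.Try

// Hypothetical stand-ins for a UDT class and its case-object singleton.
class DemoUDT
case object DemoUDT extends DemoUDT

object ClassNameSketch {
  // Runtime class name of the singleton; the object's class name ends with '$'.
  def singletonName: String = DemoUDT.getClass.getName

  // Same name with the trailing '$' removed, i.e. the name of the plain class.
  def strippedName: String = singletonName.stripSuffix("$")

  // Mimics reflective reconstruction: load the class by name and call its
  // public no-arg constructor. Any reflection error is captured in the Try.
  def instantiate(className: String): Try[Any] =
    Try(Class.forName(className).getConstructor().newInstance())

  def main(args: Array[String]): Unit = {
    println(singletonName + " -> " + instantiate(singletonName))
    println(strippedName + " -> " + instantiate(strippedName))
  }
}
```

In this sketch the first lookup fails because the singleton's class exposes no public constructor, while the second succeeds because the stripped name points at the ordinary class.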
Did you read the Contributor Guide?
Is this PR related to a ticket?
[GH-XXX] my subject.
Closes #2608 (RasterUDT Failing to Write to Delta Format; But works with output of RS_Union_Aggr)
Closes #2347 (Raster Data Types in dataframe columns)
What changes were proposed in this PR?
Add a jsonValue override to RasterUDT that strips the trailing $ from the Scala case object class name, fixing Delta Lake and Parquet write failures.
RasterUDT is defined as a Scala case object, so getClass.getName returns org.apache.spark.sql.sedona_sql.UDT.RasterUDT$ (with a trailing $). Spark's UserDefinedType.jsonValue stores this class name in the JSON schema. When Delta Lake or Parquet tries to reconstruct the UDT during deserialization via Class.forName(...).getConstructor().newInstance(), it fails with:
- NoSuchMethodException: RasterUDT$.<init>() (JSON schema round-trip)
- UNSUPPORTED_DATATYPE referencing RasterUDT$ (Parquet/Delta write)
This is the same issue that was previously fixed in GeometryUDT and GeographyUDT.
Note: RS_Union_Aggr was not affected because it uses ExpressionEncoder resolved via UDTRegistration, which stores classOf[RasterUDT].getName (without the $ suffix). Other raster functions (e.g., RS_MakeEmptyRaster) use InferredExpression, which references the case object singleton directly.
How was this patch tested?
Added 3 new tests to RasterUDTSuite:
1. Serializes the RasterUDT schema to JSON, deserializes it with DataType.fromJson, and verifies round-trip equality
2. Creates a raster with RS_MakeEmptyRaster, writes to Parquet, reads it back, and verifies schema and row count
3. Verifies that RS_Union_Aggr output can be written to and read from Parquet (this already worked before the fix, serving as a control test)
All tests pass after the fix. Tests 1 and 2 fail without the fix, reproducing the reported issue.
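The shape of these checks can be illustrated without Spark or Sedona on the classpath. The sketch below is an assumption-laden simplification: SampleUDT, the Map-based "schema", and the serialize/deserialize helpers are hypothetical stand-ins, not the code in RasterUDTSuite and not Spark's DataType.fromJson. It contrasts the class name stored without the fix, the name stored with the fix, and the classOf-based name that serves as the control case.

```scala
import scala.util.Try

// Hypothetical stand-ins, not Sedona's actual classes.
class SampleUDT
case object SampleUDT extends SampleUDT

object RoundTripSketch {
  // Simplified "schema": only the class-name field that a JSON schema would carry.
  def serialize(className: String): Map[String, String] =
    Map("type" -> "udt", "class" -> className)

  // Simplified reader: rebuild the type reflectively from the stored class name.
  def deserialize(schema: Map[String, String]): Try[SampleUDT] =
    Try(Class.forName(schema("class")).getConstructor().newInstance().asInstanceOf[SampleUDT])

  // Name stored when the singleton's own class name is used unchanged.
  val withoutFix: String = SampleUDT.getClass.getName
  // Name stored once the trailing '$' is stripped.
  val withFix: String = SampleUDT.getClass.getName.stripSuffix("$")
  // Name obtained from the class itself (the control case).
  val control: String = classOf[SampleUDT].getName

  // True when the type can be reconstructed from its serialized schema.
  def roundTrips(className: String): Boolean =
    deserialize(serialize(className)).isSuccess

  def main(args: Array[String]): Unit = {
    println("without fix round-trips: " + roundTrips(withoutFix))
    println("with fix round-trips: " + roundTrips(withFix))
    println("control round-trips: " + roundTrips(control))
  }
}
```

Under these assumptions the stripped name and the classOf-based name are the same string, which is consistent with the control case having worked before the fix.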
Did this PR include necessary documentation updates?